Privacy-Preserving Bayesian Network Learning From Heterogeneous Distributed Data
Abstract
In this paper, we propose a post-randomization technique for learning a Bayesian network (BN) from distributed heterogeneous data in a privacy-sensitive fashion. In this setting, two or more parties own sensitive data but want to learn a Bayesian network from the combined data. We consider both structure and parameter learning for the BN. The only information required from the data set is a set of sufficient statistics for learning both the network structure and the parameters. The proposed method estimates these sufficient statistics from the randomized data and then uses the estimates to learn a BN. For structure learning, we face the familiar extra-link problem, since estimation errors tend to break the conditional independence among the variables. To solve this problem, we propose modifications of the score functions used for BN learning. We show both theoretically and experimentally that post-randomization is an efficient, flexible, and easy-to-use method for learning a Bayesian network from privacy-sensitive data.
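The idea of estimating sufficient statistics from randomized data can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single categorical variable, a publicly known perturbation matrix that keeps each value with probability `p_keep` and otherwise moves it uniformly to another value, and a simple matrix-inversion estimator. The names `pram_matrix`, `randomize`, and `estimate_counts` are hypothetical.

```python
import numpy as np

def pram_matrix(k, p_keep):
    """Assumed k x k perturbation matrix: P[i, j] = Pr(report j | true value i).
    A value is kept with probability p_keep, otherwise moved uniformly
    to one of the other k - 1 values."""
    off = (1.0 - p_keep) / (k - 1)
    P = np.full((k, k), off)
    np.fill_diagonal(P, p_keep)
    return P

def randomize(values, P, rng):
    """Replace each true value v by a random draw from row P[v]."""
    k = P.shape[0]
    return np.array([rng.choice(k, p=P[v]) for v in values])

def estimate_counts(randomized, P):
    """Estimate the original counts n from the randomized counts m.
    Since E[m] = P^T n, solving P^T n = m gives an unbiased estimate
    (individual entries can be fractional or even negative)."""
    k = P.shape[0]
    observed = np.bincount(randomized, minlength=k).astype(float)
    return np.linalg.solve(P.T, observed)

# Illustrative run on synthetic data (not data from the paper).
rng = np.random.default_rng(0)
true_values = rng.choice(3, size=2000, p=[0.6, 0.3, 0.1])
P = pram_matrix(3, p_keep=0.8)
released = randomize(true_values, P, rng)
print("true counts     :", np.bincount(true_values, minlength=3))
print("estimated counts:", np.round(estimate_counts(released, P), 1))
```

In this sketch only `released` and `P` would leave a party's site; the estimated counts would then stand in for the exact counts when scoring candidate structures or fitting parameters. The estimates carry sampling noise, which is the kind of error the abstract links to spurious extra links.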
Similar resources
Distributed Data Mining Protocols for Privacy: A Review of Some Recent Results
With the rapid advance of the Internet, a large amount of sensitive data is collected, stored, and processed by different parties. Data mining is a powerful tool that can extract knowledge from large amounts of data. Generally, data mining requires that data be collected into a central site. However, privacy concerns may prevent different parties from sharing their data with others. Cryptograph...
Privacy-Preserving Data Mining Algorithm Quantum Ant Colony Optimization
Bayesian networks have been used extensively in data mining. This paper proposes a privacy-preserving data mining algorithm based on quantum ant colony optimization. The algorithm operates on a distributed database and is divided into two steps. In the first step, the modified quantum ant colony optimization algorithm is used to get the local Bayesian network structure. The purpo...
Privacy-Preserving Incremental Bayesian Network Learning
Bayesian Networks (BNs) have received significant attention in various academic and industrial applications, such as modeling knowledge in image processing, engineering, medicine and bio-informatics. Preserving the privacy of sensitive data, owned by different parties, is often a critical issue. However, in many practical applications, BNs must be trained from data that gradually becomes available a...
Incremental learning of privacy-preserving Bayesian networks
Bayesian Networks (BNs) have received significant attention in various academic and industrial applications, such as modeling knowledge in image processing, engineering, medicine and bio-informatics. Preserving the privacy of sensitive data, owned by different parties, is often a critical issue. However, in many practical applications, BNs must be trained from data that gradually becomes available a...
Privacy Preserving Association Rule Mining in Vertically Partitioned Data
Data mining technology has emerged as a means for identifying patterns and trends from large quantities of data. This paper presents privacy-preserving association rule mining across vertically partitioned data. We present an efficient algorithm to discover association rules with minimum levels of support and confidence, from heterogeneous data distributed across two parties, while preventing eit...